(57) Abstract

A telecommunications testing apparatus comprising a signal generator (7) which generates a speech-like synthetic signal which is supplied to the input of a telecommunications apparatus (1) to be tested. The distorted output of the telecommunications apparatus (1) is supplied to an analysis means (8), which derives, for both the undistorted test signal and the distorted signal from the telecommunications apparatus (1), a measure of the excitation of the human auditory system generated by both signals, taking into account both spectral masking and temporal masking phenomena. The difference between the two excitations is then calculated, and a measure of the loudness of the difference is derived which is found to indicate to a high degree of accuracy the human subjective response to the distortion introduced by the telecommunications system.

METHOD AND APPARATUS FOR OBJECTIVE SPEECH QUALITY MEASUREMENTS
OF TELECOMMUNICATION EQUIPMENT 

This invention relates to a method and apparatus for testing telecommunications apparatus.

In testing telecommunications apparatus (for example, a telephone line, a telephone network, or communications apparatus such as a coder), a test signal is introduced to the input of the telecommunications apparatus, and some test is applied to the resulting output of the apparatus. It is known to derive "objective" test measurements, such as the signal-to-noise ratio, which can be calculated by automatic processing apparatus. It is also known to apply "subjective" tests, in which a human listener listens to the output of the telecommunications apparatus and gives an opinion as to the quality of the output.

Some elements of telecommunications systems are linear. Accordingly, it is possible to apply simple artificial test signals to them, such as discrete-frequency sine waves, swept sine signals or chirp signals, random or pseudo-random noise signals, or impulses. The output signal can then be analyzed using, for example, a Fast Fourier Transform (FFT) or some other spectral analysis technique. One or more such simple test signals are sufficient to characterise the behaviour of a linear system.

On the other hand, modern telecommunications systems include an increasing number of elements which are nonlinear and/or time-variant. For example, modern low bit-rate digital speech coders, forming part of mobile telephone systems, have a nonlinear response, and automatic gain controls (AGCs), voice activity detectors (VADs) and associated voice switches, and burst errors contribute time variations to the telecommunications systems of which they form part. Accordingly, it is increasingly difficult to use simple test methods developed for linear systems to derive objective measures of the distortion or acceptability of telecommunications apparatus.

On the other hand, subjective testing using human listeners is expensive, time-consuming, difficult to perform, and inconsistent. However, despite these problems, the low correlation between objective measures of system performance or distortion and the subjective response of a human user of the system means that such subjective testing remains the best way of testing telecommunications apparatus.

Recently, in the paper "Measuring the Quality of Audio Devices" by John G. Beerends and Jan A. Stemerdink, presented at the 90th AES Convention, 19-22 February 1991, Paris, and printed as AES Preprint 3070 (L-8) by the Audio Engineering Society, it has been proposed to measure the quality of a speech coder for digital mobile radio by using, as test signals, a database of real recorded speech, and analyzing the corresponding output of the coder using a perceptual analysis method designed to correspond in some aspects to the processes which are thought to occur in the human ear.

It has also been proposed (for example, in "Objective Measurement Method for Estimating Speech Quality of Low Bit Rate Speech Coding", Irii, Kurashima, Kitawaki and Itoh, NTT Review, Vol. 3, No. 5, September 1991) to use an artificial voice signal (i.e. a signal which is similar in a spectral sense to the human voice, but which does not convey any intelligence) in conjunction with a conventional distortion analysis measure, such as the cepstral distance (CD) measure, to measure the performance of telecommunications apparatus.

It would appear obvious, when testing apparatus such as a coder which is designed to encode human speech, and when employing an analysis method based on the human ear, to use real human speech samples, as was proposed in the above paper by Beerends and Stemerdink. In fact, however, the performance of such test systems is not particularly good.

Accordingly, it is an object of the invention to provide an improved telecommunications testing apparatus and method. It is another object of the invention to provide a telecommunications testing apparatus which can provide a measure of the performance of a telecommunications system which matches the subjective human perception of the performance of the system.

The present invention provides telecommunications testing apparatus comprising a signal generator for supplying a test signal which has a spectral resemblance to human speech but does not correspond to a single speaker conveying intelligent content, and analysis means for receiving a distorted signal which corresponds to said test signal when distorted by telecommunications apparatus to be tested, and for analyzing said distorted signal to generate a distortion perception measure which indicates the extent to which the distortion of said signal will be perceptible to a human listener.

Other aspects and preferred embodiments of the invention will be apparent from the following description and claims.

The invention will now be illustrated, by way of example only, with reference to the accompanying drawings, in which:

Figure 1 is a block diagram showing the arrangement of an embodiment of the invention in use;

Figure 2 is a block diagram showing in greater detail the components of an embodiment of the invention;

Figure 3 is a block diagram showing in greater detail a test signal generator forming part of the embodiment of Figure 2;

Figure 4 shows schematically the structure of a test signal over time;

Figure 5a is a graph of the level of masked noise (dB) against a pitch (i.e. approximately logarithmic frequency) axis in critical band rate (Bark) units, for different levels of masking noise; and

Figure 5b is a diagram showing the variation of excitation threshold on a pitch (approximately logarithmic frequency) axis in critical band rate (Bark) units, for masking noise at seven given frequencies;

Figure 6 is a block diagram showing in greater detail an analysis unit forming part of the embodiment of Figure 2;

Figures 7a and 7b form a flow diagram indicating schematically the operation of the analysis unit in the embodiment of Figure 6;

Figure 8a shows schematically an estimate formed in this embodiment of the amplitude of excitation, as a function of time and pitch, which would be produced in the human ear by a predetermined speech-like signal; and

Figure 8b is a corresponding plot showing the excitation which would be produced by two spaced clicks;

Figure 9 is a plot of weighting values against frequency for converting amplitude to perceived loudness in this embodiment;

Figure 10 is an exemplary plot of error loudness values for successive time segments calculated by the analysis means according to Figure 7;

Figure 11 corresponds to a modified portion of Figure 7b in a further embodiment of the invention;

Figure 12a is a diagram of distortion amplitude over pitch and time axes representing a low magnitude nonlinear distortion of the speech signal depicted in Figure 8a; and

Figure 12b is a plot of perceived error loudness derived from Figure 12a and corresponding in form to Figure 10;

Figure 13a corresponds to Figure 12a but with higher amplitude nonlinear distortion; and

Figure 13b likewise corresponds to Figure 12b;

Figure 14a corresponds to Figure 12a but with the substitution of MNRU distortion; and

Figure 14b is a corresponding plot of error loudness over time;

Figure 15a corresponds to Figure 12a but with the substitution of crossover distortion; and

Figure 15b is a corresponding plot of error loudness over time;

Figure 16a corresponds to Figure 12a but with the substitution of clipping distortion due to a voice activity detector; and

Figure 16b is a corresponding plot of error loudness over time.

Overview of Apparatus 

Referring to Figure 1, telecommunications 
apparatus 1 comprises an input port 2 and an output 
port 3. Test apparatus 4 comprises an output port 5 
for coupling to the input port 2 of the 
telecommunications apparatus under test, and an input 
port 6 for coupling to the output port 3 of the 
telecommunications apparatus under test. 

Referring to Figure 2, the test apparatus 4 comprises a test signal generator 7 coupled to the output port 5, for supplying a speech-like test signal thereto, and a signal analyzer unit 8 coupled to the input port 6 for analyzing the signal received from the telecommunications apparatus 1. As will be discussed in greater detail below, the analyzer 8 also utilises an analysis of the test signal generated by the test signal generator 7, and this is indicated in this embodiment by a path 9 running from the output port 5 to the input port 6.

Also provided from the analysis unit 8 is a measurement signal output port 10, at which a signal indicating some measure of the acceptability of the telecommunications apparatus (for example, distortion) is provided either for subsequent processing or for display on a visual display unit (VDU), not shown.

First Embodiment 

Speech Signal Generation

In its simplest form, the artificial speech generator may merely comprise a digital store 71 (e.g. a hard disc or digital audio tape) containing stored digital data from which a speech signal can be reconstituted. The stored data may be individual digitised speech samples, which are supplied in succession from the store 71 to a signal reconstituting means 72 (e.g. a digital-to-analog converter (DAC)) connected to the output port 5. The sample data stored on the store 71 comprises one or more speech utterances lasting several seconds (for example, on the order of ten seconds).

Alternatively, the store 71 may store speech data in the form of filter coefficients to drive, for example, an LPC speech synthesizer, or higher-level data (e.g. phoneme, pitch and intensity data) to drive a phoneme synthesizer comprising the reconstituting means.

A control circuit 73 (e.g. a microprocessor) controls the operation of the store unit 71 to select a particular test signal to be output.

Referring to Figure 4, the test signal data stored in the store 71 is reconstituted to form a test signal comprising a plurality of segments t0, t1, t2 ... tn.

Each of the segments t0 to tn typically corresponds to a different speech sound (e.g. a different phoneme) or to silence. One known artificial voice test signal is disclosed in CCITT Recommendation P50 (Recommendation on Artificial Voices, Vol. Rec P50, Melbourne 1988, published by CCITT). In the P50 test signal, each segment lasts 60 ms.

The segments are grouped into patterns, each comprising a randomly selected sequence of 16 predetermined spectral patterns, defined by the recommendation, with spectrum densities S_i(f) equal to:

$$S_i(f) = \frac{1}{A_{i0} + 2\sum_{j=1}^{12} A_{ij}\cos(2\pi j f)}, \qquad i = 1, 2, \ldots, 16$$
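As a rough illustration of the formula above, the following Python sketch evaluates S_i(f) for one pattern from its 13 coefficients A_i0 to A_i12. The coefficient tables themselves are defined in Recommendation P50 and are not reproduced here, so any values passed in are hypothetical; the sketch also assumes that f is a normalised frequency (cycles per sample), which the text above does not state.

```python
import numpy as np

def p50_spectrum(a_i, f):
    """Evaluate S_i(f) = 1 / (A_i0 + 2 * sum_{j=1..12} A_ij * cos(2*pi*j*f)).

    a_i : 13 coefficients [A_i0, A_i1, ..., A_i12] for one spectral pattern i
          (the real values are tabulated in P50; none are reproduced here).
    f   : frequency, assumed normalised to the sampling rate (cycles per sample).
    """
    a_i = np.asarray(a_i, dtype=float)
    if a_i.shape != (13,):
        raise ValueError("expected 13 coefficients A_i0..A_i12")
    f = np.atleast_1d(np.asarray(f, dtype=float))
    j = np.arange(1, 13)
    # cos(2*pi*j*f) for every (frequency, j) pair: shape (len(f), 12).
    cos_terms = np.cos(2 * np.pi * np.outer(f, j))
    denom = a_i[0] + 2 * cos_terms @ a_i[1:]
    return 1.0 / denom
```

For example, with the hypothetical coefficients `[2.0, 0.5]` followed by eleven zeros, the denominator is 2 + cos(2πf), so `p50_spectrum([2.0, 0.5] + [0.0] * 11, [0.0, 0.5])` gives 1/3 at f = 0 and 1 at f = 0.5.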

The transition between the different segments in each pattern is arranged to be smooth. Of the patterns, 13 correspond to voiced speech and the remaining 3 to unvoiced speech. A sequence of speech can either be stored on a recording medium and reproduced, or can be generated from stored data using a vocoder, as described in the above-referenced Irii paper, for example.

The P50 signal has a long-term and short-term spectral similarity to speech when averaged over about 10 seconds. Accordingly, the speech sequence shown in Figure 4 preferably lasts at least this long.

Distortion 

The signal leaving the telecommunications apparatus 1 under test differs from the test signal supplied to the input port 2. Firstly, there will be time-invariant linear distortions of the signal, resulting in overall changes of amplitude, and in filtering of the signal so as to change its spectral shape. Secondly, noise will be added to the signal from various sources, including constant noise sources (such as thermal noise) and discontinuous sources (such as noise bursts, dialling pulses, interference spikes and crossed lines). Thirdly, there will be nonlinear and time-varying distortions of the signal, due to nonlinear elements such as codecs and time-varying elements such as echo cancellers and thresholders.

The presence of nonlinear distortion can cause intermodulation between noise and the signal, and the distortion at the output port 3 therefore depends not only upon the signal and the apparatus 1 but also upon the noise. Further, the presence of time-varying distortion means that the distortion applied to any given temporal portion of the signal depends upon preceding temporal portions of the signal and noise. For instance, if high-level noise is present before the beginning of a phoneme, a voice activity detector may not clip the phoneme at all, whereas if the phoneme is preceded by silence, the voice activity detector will heavily clip the beginning of the phoneme, causing substantial distortion.
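The context dependence described above can be illustrated with a deliberately simplified voice switch. The `toy_vad` function below is hypothetical and is not modelled on any particular detector: it passes a frame only once the frame energy has exceeded a threshold for `onset_frames` consecutive frames. A phoneme preceded by silence therefore loses its first frames, while the same phoneme preceded by above-threshold noise passes intact.

```python
import numpy as np

def toy_vad(energy, threshold, onset_frames=3):
    """Hypothetical voice switch with an onset delay (illustration only).

    The switch opens once `onset_frames` consecutive frames exceed `threshold`
    and closes as soon as a frame falls below it. Frames where the switch is
    closed are zeroed in the returned array.
    """
    energy = np.asarray(energy, dtype=float)
    out = np.zeros_like(energy)
    run = 0  # number of consecutive above-threshold frames so far
    for t, e in enumerate(energy):
        run = run + 1 if e > threshold else 0
        if run >= onset_frames:
            out[t] = e
    return out

# The same five-frame "phoneme" (level 10) after silence and after noise.
after_silence = toy_vad([0] * 5 + [10] * 5, threshold=1.0)
after_noise = toy_vad([2] * 5 + [10] * 5, threshold=1.0)
print(after_silence[5:], after_noise[5:])
```

After silence, the first two frames of the phoneme are clipped; after noise, the switch is already open and the phoneme is passed unchanged.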

Analyzer 8

The analysis according to the present invention is intended to provide an acceptability signal output which depends upon the distortion of the test signal in a manner similar to the response of the human ear, as it is presently understood.

Without dwelling upon the physical or biological mechanisms giving rise to these phenomena, it is well known that the human perception of sound is affected by several factors. Firstly, the presence of one sound "masks" (i.e. suppresses the perception of) another sound in a similar spectral (frequency) region. The extent to which the other sound is masked depends, firstly, upon how close in pitch it is to the first sound and, secondly, upon the amplitude of the first sound.

Thus, the human perception of errors or distortions in a sound depends upon the sound itself; errors of low amplitude in the same spectral region as the sound itself may be masked and correspondingly be inaudible (as occurs, for example, with quantising errors in sub-band coding).

Secondly, the masking phenomenon has some time dependence. A sound continues to mask other sounds for a short period after the sound is removed; the amplitudes of the subsequent sounds which will be masked decay rapidly after the removal of the first sound. Thus, errors or distortions will be masked not only by the present signal but also (to a lesser extent) by portions of the signal which preceded it. This is referred to as "forward masking". It is also found that the application of a high-level sound just after a lower-level sound which would otherwise have been audible retrospectively makes the earlier sound inaudible. This is referred to as "backward masking".

Thirdly, the human ear is not directly responsive to frequency, but to the phenomenon perceived as the "pitch" of a sound, which corresponds to a nonlinear warping of the frequency axis.

Fourthly, the human ear is not directly responsive to amplitude, even when a signal is not masked, but to the phenomenon perceived as loudness, which is a nonlinear function of amplitude.

Accordingly, in this embodiment the analyzer 8 is arranged to process the signal received from the telecommunications equipment 1 to determine how significant or objectionable the distortion thereby produced in the test signal will be to a human listener, in accordance with the above known characteristics of the human ear.

More particularly, the analysis unit 8 is arranged to determine what the response of the human ear will be to the test signal generated by the test signal generator 7, and then to process the signal from the telecommunications apparatus output 3 in the same way, to determine the extent to which it perceptibly differs from the original test signal by determining the extent to which distortions are perceivable.

Figure 5a shows schematically the variation of the spectral masking threshold (the level below which a second sound is obscured by a first) for narrow-band noise at a fixed frequency. The five curves are for progressively higher levels of masking noise, and it will be seen that the effect of increasing the level of masking noise is to cause a roughly linear increase in the masking threshold at the masking noise frequency, but also to change the shape of the threshold away from the noise frequency (predominantly towards higher frequencies). The masking effect is therefore nonlinear with respect to the amplitude of the masking noise.

For a given masking noise level, the width of the masked spectral band (measured, for example, at the points 3 dB below the threshold at the central masking frequency) varies with the frequency of the masking noise. This variation of the width of the masked bands is related to the characteristic of the human auditory filter shape for frequency discrimination, and therefore to the human perception of pitch.

Accordingly, as shown in Figure 5b, a scale of pitch, rather than frequency, can be generated from the frequency scale by warping it so as to create a new scale in which the widths of the masking bands are constant. Figure 5b shows the critical band rate, or Bark, scale, which is derived by considering a set of narrow-band masking tones at different frequencies which cross at the -3 dB point. This scale is described, for example, in "Audio Engineering and Psychoacoustics: Matching Signals to the Final Receiver, the Human Auditory System", Zwicker and Zwicker, J. Audio Eng. Soc., Vol. 39, March 1991.

The critical bands shown in Figure 5b are similar in shape (on the frequency axis) below 500 hertz when represented on a linear frequency scale. Above 500 hertz, they are similar in shape when viewed on a logarithmic frequency scale. Since the telephony bandwidth is typically 300 to 3150 hertz, and telecommunications apparatus is often band-limited to between these limits, the transformation to the pitch scale in this embodiment ignores the linear region below 500 hertz with only a small compromise in accuracy.
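For readers who want a numerical mapping from frequency to critical band rate, one widely used analytic approximation is that of Zwicker and Terhardt (1980): z = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2), with f in hertz and z in Bark. It is not taken from this document, and this embodiment uses one-third octave bands as its pitch scale instead; the sketch only shows the shape of the warping over the telephony band.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of critical band rate (Bark) from frequency (Hz).

    This analytic fit is an outside assumption, not a formula from this document.
    """
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

# Critical band rate at a few frequencies in and around the telephony band.
for f in (300, 500, 1000, 3150):
    print(f, round(float(hz_to_bark(f)), 2))
```

The mapping is monotonic, and 1 kHz lands at roughly 8.5 Bark, consistent with the usual tabulated critical band rate scale.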

Referring to Figure 6, the analysis unit 8 comprises an analog-to-digital converter (ADC) 81 arranged to receive signals from the input port 6 and produce a corresponding digital pulse train; an arithmetic processor 82 (for example, a microprocessor such as the Intel 80486 processor, or a digital signal processing device such as the Western Electric DSP 32C or the Texas Instruments TMS C30 device) coupled to receive the digital output of the ADC 81; a memory device 83 storing instruction sequences for the processor 82 and providing working memory for storing arithmetic results; and an output line 84 from the processor 82 connected to the output 10.

Referring to Figure 7, the processes performed by 
the processor 82 in this embodiment will now be 
described. 

Firstly, in a step 100, the test signal supplied from the test signal generator 7 is input directly to the input port 6, without passing through the telecommunications apparatus 1.

In the next step 101, the signal from the ADC 81 is filtered by a filter which corresponds to the transfer function between the outer portions of the ear and the inner ear. The filtering may typically be performed by executing a digital filtering operation in accordance with filter data stored in the memory 83. The filter may be characterised by a transfer function of the type described in "Psychoacoustic models for evaluating errors in audio systems", J.R. Stuart, Procs. IOA, vol. 13, part 7, 1991.

In fact, the transfer function to the inner ear will vary slightly depending upon whether the sound is coupled closely to the ear (e.g. through a headset) or more distantly (e.g. from a loudspeaker). Accordingly, the processor 82 and store 83 may be arranged to store the characteristics of several different transfer functions corresponding to different sound locations related to the type of telecommunications apparatus 1 on test, and to select an appropriate filter in response to a user input specifying the telecommunications apparatus type. The filtered signal after the execution of step 101 corresponds to the signal as it would be received at the inner ear.
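A minimal sketch of this filter selection is shown below. The filter taps are hypothetical placeholders (the document does not give the transfer functions, which it attributes to the Stuart paper), the filters are assumed to be short FIR filters applied with `numpy.convolve`, and the `coupling` argument stands in for the user input specifying the apparatus type.

```python
import numpy as np

# Hypothetical FIR taps for illustration only; the real outer-to-inner ear
# transfer functions are not given in this document.
EAR_FILTERS = {
    "headset": np.array([0.9, 0.1]),
    "loudspeaker": np.array([0.7, 0.2, 0.1]),
}

def outer_to_inner_ear(signal, coupling):
    """Apply the stored filter selected by the apparatus/coupling type."""
    try:
        taps = EAR_FILTERS[coupling]
    except KeyError:
        raise ValueError(f"unknown coupling type: {coupling!r}") from None
    # Causal FIR filtering, truncated to the input length.
    return np.convolve(np.asarray(signal, dtype=float), taps)[: len(signal)]
```

Storing the filters in a lookup table keyed by apparatus type mirrors the description above: adding a new coupling type only requires storing another set of filter data.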

Next, in a step 102, the signal is split into a plurality of spectral bands having bandwidths which vary logarithmically with frequency, so as to effect the transformation from frequency to pitch. In this embodiment, the signal is bandpass filtered into 20 bands, each one-third of an octave in bandwidth, from 100 hertz to 8 kilohertz, according to International Standard ISO 532B; the ISO band filters are similar in shape when viewed on a logarithmic frequency axis and are well known and documented. The average signal amplitude in each of the 20 bands is calculated every 4 milliseconds, and the signal after filtering thus comprises a series of time segments, each comprising 20 frequency band amplitude values. This bandpass filtering is performed for all the values in the test signal (which lasts on the order of several seconds, for example, 10 seconds).
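The band splitting and 4 ms averaging of step 102 can be sketched as follows. The sketch substitutes ideal (brick-wall) FFT band-pass filters for the ISO filter shapes, which it does not reproduce, so it lacks the overlapping filter skirts discussed in the next paragraph. It also assumes that "average amplitude" means the mean absolute value over each 4 ms segment. The 20 centre frequencies are the base-two values 1000 × 2^(k/3) for k = -10 to 9, whose nominal labels run from 100 Hz to 8 kHz.

```python
import numpy as np

def third_octave_centres():
    """Base-2 one-third octave centres 1000 * 2**(k/3), k = -10..9 (nominally 100 Hz to 8 kHz)."""
    return 1000.0 * 2.0 ** (np.arange(-10, 10) / 3.0)

def band_amplitudes(x, fs, frame_s=0.004):
    """Return an array (n_frames, 20): mean absolute amplitude per band per 4 ms frame.

    Brick-wall FFT filters stand in for the ISO 532B filter shapes (an assumption).
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    frame_len = int(round(frame_s * fs))
    n_frames = len(x) // frame_len
    out = np.zeros((n_frames, 20))
    for b, fc in enumerate(third_octave_centres()):
        # Band edges one-sixth of an octave either side of the centre.
        lo, hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spectrum * mask, n=len(x))
        frames = band[: n_frames * frame_len].reshape(n_frames, frame_len)
        out[:, b] = np.abs(frames).mean(axis=1)
    return out
```

For a 1 kHz tone sampled at an assumed 16 kHz, all of the energy appears in band index 10 (the 1 kHz band), and each 4 ms frame reports an average close to 2/π for a unit-amplitude sine.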

The relatively wide filters take account of the masking within each filter band, and the broad, overlapping skirts of the filters ensure that spectral masking due to neighbouring frequencies is also taken into account.

Next, in step 103, frequency-dependent auditory thresholds specified in International Standard ISO 226 are applied to each of the band outputs. This simulates the effect of the minimum audibility threshold indicated in Figure 5a.
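Step 103 can be sketched as a simple gate: each band level below its threshold is set to zero, which produces the zero amplitudes mentioned after step 105. The per-band threshold values used below are hypothetical stand-ins for the ISO 226 data, and the sketch assumes that levels and thresholds are both in dB.

```python
import numpy as np

def apply_hearing_threshold(levels_db, threshold_db):
    """Zero every band level that falls below its absolute hearing threshold.

    levels_db    : array (n_frames, n_bands) of band levels in dB.
    threshold_db : array (n_bands,) of per-band thresholds in dB
                   (hypothetical here; the embodiment uses ISO 226 values).
    """
    levels_db = np.asarray(levels_db, dtype=float)
    threshold_db = np.asarray(threshold_db, dtype=float)
    # Broadcasting compares each frame's bands with the per-band thresholds.
    return np.where(levels_db >= threshold_db, levels_db, 0.0)
```

For example, with thresholds of 20 dB and 10 dB in two bands, a frame with levels (10, 30) dB becomes (0, 30) dB.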

Next, in step 104, the bandpass signal amplitudes are converted to a phon or sensation level, which corresponds more closely to the loudness with which they would be perceived by a human auditory system. The conversion is nonlinear, and depends upon both signal amplitude and frequency. Accordingly, to effect the conversion, the equal loudness contours specified in International Standard ISO 226 are applied to each of the band outputs. Both these equal loudness contours and the thresholds used in step 103 are stored in the memory 83.

Next, in step 105, temporal masking (specifically, forward masking) is performed by providing an exponential decay after a significant amplitude value. In fact, the rate of decay of the masking effect depends upon the time of application of the masking sound; the decay time is longer for a longer time of application than for a shorter one. However, in this embodiment, it is found sufficient to apply a fixed exponentially weighted decay, defined by

$$y = 56.5 \times 10^{-0.01x}$$

(where y represents level and x represents time), which falls between the maximum decay (corresponding to over 200 milliseconds duration) and the minimum decay (corresponding to 5 milliseconds duration) encountered in practice.
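The decay law above can be evaluated directly. The text does not state the units of x, so the sketch leaves them unspecified; whatever the units, y starts at 56.5 at x = 0 and falls by a factor of 10 for every 100 units of x.

```python
import numpy as np

def decay_level(x):
    """Fixed forward-masking decay of this embodiment: y = 56.5 * 10**(-0.01 * x)."""
    return 56.5 * 10.0 ** (-0.01 * np.asarray(x, dtype=float))

# y at x = 0, 50, 100 and 200 (units of x as in the text above).
print(decay_level([0, 50, 100, 200]))
```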

In applying the forward masking, at each time 
segment for each bandpass filter amplitude, masking 
values for the corresponding bandpass in the three 
following time segments are calculated, using the 
above exponential decay. The three values are 
compared with the actual amplitudes of those bands, 
and if higher than the actual amplitudes, are 
substituted for the actual amplitudes . 

As noted above, it is also possible for a sound to mask an earlier-occurring sound (so-called "backward masking"). Preferably, in this embodiment, the forward masking process is replicated to perform backward masking, using the same type of exponential decay but with different numerical constants (in other words, for each time segment, values of masking for earlier-occurring time segments are calculated, and if higher than the actual amplitudes for those bands, are substituted for the actual amplitudes).
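The forward and backward masking of step 105 can be sketched on an excitation surface `E` (time segments by bands). This is an interpretation, not necessarily the embodiment's exact procedure. It assumes the following:

* the decay acts multiplicatively on amplitude;
* the forward factor is 10^(-0.01 × 4) per 4 ms segment, which takes x in the decay law to be milliseconds;
* the backward factor is a hypothetical 0.5, because the document does not give the backward constants;
* every masking value is computed from the unmasked amplitudes, so masking does not cascade.

```python
import numpy as np

def temporal_masking(E, fwd_per_seg=10 ** (-0.01 * 4), bwd_per_seg=0.5, span=3):
    """Forward and backward masking over `span` neighbouring segments.

    E           : array (n_segments, n_bands) of band amplitudes.
    fwd_per_seg : forward decay per segment (assumes x in ms, 4 ms segments).
    bwd_per_seg : backward decay per segment (hypothetical value).
    Each amplitude projects a decayed masking value onto the same band in the
    following (forward) and preceding (backward) segments; a segment's value is
    replaced by the projected value when the projection is higher.
    """
    E = np.asarray(E, dtype=float)
    out = E.copy()
    n = E.shape[0]
    for m in range(1, min(span, n - 1) + 1):
        # Forward: segment t masks segment t + m.
        out[m:] = np.maximum(out[m:], E[: n - m] * fwd_per_seg ** m)
        # Backward: segment t masks segment t - m.
        out[: n - m] = np.maximum(out[: n - m], E[m:] * bwd_per_seg ** m)
    return out
```

A single click of amplitude 100 in segment 3 produces the kind of exponential tails visible in Figure 8b: a slowly decaying forward tail over the next three segments and a faster backward tail over the preceding three.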

Thus, after step 105, the calculated signal data comprises a succession of time segment data, each comprising 20 bandpass signal amplitudes, thresholded so that some amplitudes are zero, the amplitude of a given band in a given time segment being dependent upon the amplitudes of corresponding bands in past and future time segments due to the forward and backward masking processing.

This corresponds to a surface indicating, along the signal pitch and time axes, the masking effect which the test signal would have had upon the human ear if directly applied without the telecommunications apparatus 1.

Referring to Figure 8, Figures 8a and 8b show excitation surfaces generated by the above process. Figure 8a corresponds to a speech event comprising a voiced sound followed by an unvoiced sound; the formant structure of the first sound and the broadband nature of the second sound can readily be distinguished. Figure 8b shows a corresponding surface for two clicks, and the effect of the forward masking stage 105 of Figure 7 is clearly visible in the exponential decays therein.

Next, in step 106, the test signal generator 7 repeats the test signal, but this time it is supplied to the input port 2 of the telecommunications apparatus 1, and the output port 3 thereof is connected to the input port 6 of the test apparatus 4. The calculation stages 101 to 105 are then repeated to calculate a corresponding surface for the received signal from the telecommunications apparatus 1.

Having calculated the effect on the ear (excitation) of the original test signal and of the output from the telecommunications apparatus (the distorted test signal), the difference in the extent to which the two excite the ear corresponds to the level of distortion of the test signal as perceived by the human auditory system. Accordingly, the amplitude transfer function of the telecommunications apparatus is calculated for each segment by taking the ratio between the corresponding bandpass amplitudes (or, where the bandpass amplitudes are represented on a dB scale as in Figure 8a or 8b, by taking the difference between the amplitudes in dB). To avoid an overall gain term in the transfer function, which is irrelevant to the perceived distortion produced by the telecommunications apparatus, each bandpass term may be normalised, in step 107, by dividing it by (or, when represented in dB, subtracting from it) the average amplitude over all bandpass filter outputs over all time segments in the test signal sequence.

If the original test signal and the output of the telecommunications apparatus 1 are identical but for some overall level difference (that is to say, if the telecommunications apparatus 1 introduces no distortion), the ratio between each bandpass filter output of the two signals will be unity, and the logarithmic difference in amplitude in dB will be zero; accordingly, the difference plot corresponding to Figure 8a or Figure 8b would be completely flat at all times and in all pitch bands. Any deviation is due to distortion in the telecommunications apparatus. Additive distortion errors will appear as peaks, and signal loss will appear as troughs, relative to the undistorted average level.
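The normalisation of step 107 and the flat-surface property just described can be checked with a short sketch. It assumes that both excitation surfaces are already in dB and normalises each surface by subtracting its own average over all bands and segments, which is one reading of step 107. An overall gain change then cancels exactly, while a local additive error shows up as a peak.

```python
import numpy as np

def error_surface_db(ref_db, deg_db):
    """Gain-normalised difference between degraded and reference surfaces (dB).

    Each surface has its own average over all bands and all segments subtracted
    (an assumed reading of step 107), so an overall gain of the apparatus cancels.
    Positive values are additive errors (peaks), negative values are losses (troughs).
    """
    ref_db = np.asarray(ref_db, dtype=float)
    deg_db = np.asarray(deg_db, dtype=float)
    return (deg_db - deg_db.mean()) - (ref_db - ref_db.mean())

ref = np.arange(12.0).reshape(3, 4)       # toy surface: 3 segments x 4 bands
print(error_surface_db(ref, ref + 6.0))   # pure 6 dB gain: a flat surface of zeros
```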

The perceptual significance given to these errors is not directly dependent upon their amplitude, but rather upon loudness, which is a nonlinear function of amplitude and a function of frequency. Calculation of the perceptual loudness is given in International Standard ISO 532B. However, this specification applies to binaural sound, and for monaural sound (as commonly found in telecommunications applications) it is possible to use a simpler calculation of loudness based on the established monaural telephony perceptual weightings for loudness given in CCITT Recommendation P79 (Blue Book Volume V, Melbourne 1988, CCITT). This method of estimating the error loudness takes account of the fact that errors at some frequencies are perceived more easily, and are hence given greater weighting, than those at other frequencies. For each time segment in the signal sequence, in this embodiment, an error magnitude is calculated as:

$$\mathrm{ErrLoud}_t = 0.8 \sum_{n=1}^{14} ER_n \times 10^{-0.0175\,W_{Sn}}$$

where:

ErrLoud_t = error loudness at time t (positive and negative parts calculated separately);
n = nth 1/3-octave band from 200 Hz to 4 kHz;
ER_n = error amplitude in dB;
W_Sn = SLR weighting for the nth frequency;

for a narrow-band model of the error extending between 200 hertz and 4 kilohertz, where the weighting coefficients derived from the P79 Recommendation are as shown in Figure 9.

For a broadband telephony model making use of all 20 bandpass outputs, the corresponding error loudness is calculated as:

$$\mathrm{ErrLoud}_t = 1.25 \sum_{n=1}^{20} ER_n \times 10^{-0.0175\,W_{Sn}}$$

In this case, the value of n covers all 20 bands from 100 hertz to 8 kilohertz.
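Both error loudness formulas can be evaluated with one function that takes the scaling constant as a parameter: 0.8 for the 14-band model and 1.25 for the 20-band model. The SLR weightings W_Sn would come from Figure 9, which is not reproduced here, so any weights passed in are hypothetical. The sketch accumulates positive and negative error amplitudes separately, as described in the next paragraph.

```python
import numpy as np

def error_loudness(er_db, w_sn, scale):
    """Per-segment error loudness: scale * sum_n ER_n * 10**(-0.0175 * W_Sn).

    er_db : array (n_segments, n_bands) of error amplitudes ER_n in dB.
    w_sn  : array (n_bands,) of SLR weightings (hypothetical here; see Figure 9).
    scale : 0.8 for the 14-band narrow-band model, 1.25 for the 20-band model.
    Returns (positive, negative) subtotals, one value per segment for each.
    """
    er_db = np.asarray(er_db, dtype=float)
    weights = 10.0 ** (-0.0175 * np.asarray(w_sn, dtype=float))
    # Additive (positive) and shortfall (negative) errors are summed separately.
    pos = scale * (np.clip(er_db, 0, None) * weights).sum(axis=1)
    neg = scale * (np.clip(er_db, None, 0) * weights).sum(axis=1)
    return pos, neg
```

With zero weightings every factor 10^(-0.0175 W_Sn) equals 1, so a segment with errors of +2 dB and -4 dB gives subtotals of 1.6 and -3.2 under the 0.8 scaling.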

The additive error (positive errors) and short 
fall errors (negative error values) are separately 
5 cumulated to give positive and negative subtotals* 

As shown in Figure 10, the result of the calculation stage 109 is a time sequence of time segment error loudness values. In step 110, in this embodiment, the acceptability or otherwise of the telecommunications apparatus is found directly from the data shown in Figure 9, by taking, for example, the peak error loudness value and/or the average error loudness value. One or both of these criteria are then output, in step 111, as the measure of distortion of the telecommunications apparatus 1 to the output port 9.
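Step 110 can be illustrated with a small helper that reduces the time sequence of error loudness values to a peak and an average and compares them with limits. Taking magnitudes (so that shortfall errors count as well as additive ones), the helper names and the limit values are all assumptions; the source leaves the exact acceptance criteria open.

```python
def summarise_error_loudness(values):
    """Return (peak, mean) of the error loudness magnitudes over a test run."""
    if not values:
        raise ValueError("empty error loudness sequence")
    magnitudes = [abs(v) for v in values]
    return max(magnitudes), sum(magnitudes) / len(magnitudes)


def is_acceptable(values, peak_limit, mean_limit):
    """Hypothetical pass/fail rule: both peak and mean must stay within limits."""
    peak, mean = summarise_error_loudness(values)
    return peak <= peak_limit and mean <= mean_limit
```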



Second Embodiment 

In the second embodiment, the analysis unit 8 is the same as or similar to that in the first embodiment. However, the test signal generating unit 7 does not utilise the F50 test signal, but instead generates a different type of artificial, speech-like test signal.

Whilst the F50 test signal is acceptable for many purposes, it is observed to lack a full range of fricative sounds. Furthermore, it has a rather regular and monotonous long-term structure, which sounds rather like a vowel-consonant-vowel-consonant... sequence. As discussed above, however, since many telecommunications systems include time-dependent elements such as automatic gain controls or voice switches, the distortion applied to any given portion of the test signal is partly dependent upon the preceding portion of the test signal; in other words, upon the context of that portion of the speech signal within the time sequence of the signal as a whole.

Accordingly, in this embodiment, a small, representative subset of speech segments (selected from the tens of known phonemes) is utilised, and a test signal is constructed from these sounds assembled in different contextual sequences. Since distortion is being measured, it is more important that the test sequence should include successions of sounds which are relatively unlike one another or, more generally, are relatively likely to cause distortion when one follows another. In a simpler form of this embodiment, the test signal might comprise each of the selected segments prefixed by a conditioning portion selected from a high, low or zero level, so that the test signal enables each representative speech segment (phoneme) to be tested following prefixed sounds of different levels. The length of the prefixing signal is selected to extend over the time constants of the system under test; for example, codec adaptation and active gain control take on the order of a few seconds, whereas speech transducer transient response is on the order of a few milliseconds.
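The simpler form of the test signal described above (each segment preceded by a high, low or zero conditioning portion) can be sketched as follows. This is an illustration only: the function names, the use of a steady 500 Hz tone as the conditioning portion, the 8 kHz sample rate and the level values are all assumptions, not details taken from the source.

```python
import math


def conditioning_portion(level, n_samples, fs_hz, freq_hz=500.0):
    """Hypothetical conditioning signal: a steady tone at the given amplitude."""
    return [level * math.sin(2.0 * math.pi * freq_hz * i / fs_hz)
            for i in range(n_samples)]


def build_test_signal(segments, levels, prefix_s, fs_hz=8000):
    """Place every speech segment after a conditioning portion of every level.

    segments: list of sample lists, one per representative speech segment
    levels:   mapping of level name to conditioning amplitude (e.g. high/low/zero)
    Returns the concatenated signal and (level name, segment index, start sample)
    entries locating each segment for later analysis.
    """
    n_prefix = int(round(prefix_s * fs_hz))
    signal, index = [], []
    for name, level in levels.items():
        for k, segment in enumerate(segments):
            signal.extend(conditioning_portion(level, n_prefix, fs_hz))
            index.append((name, k, len(signal)))  # segment starts after prefix
            signal.extend(segment)
    return signal, index
```

In a real test, `prefix_s` would be chosen to exceed the time constants of the system under test (a few seconds for codec adaptation or gain control), as the text above explains.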

Further details of this embodiment are to be found in our earlier filed UK Patent Application No. 93 (Agents Ref: A24613), filed on 21 June 1993, entitled "Speech-like test stimulus", the contents of which are incorporated herein by reference in their entirety. The test signal of this embodiment could also be utilised with conventional analysis means.



Third Embodiment

In a third embodiment of the invention, the test signal generator 7 operates in the same manner as in the first or second embodiments. However, the operation of the analysis unit 8 differs in steps 102 to 110.

Although the logarithmically spaced filters of the first embodiment are found to be a reasonable approximation to the pitch scale of the human ear, it is found that an even better performance is given by the use of filters which are evenly spaced on a Bark scale (as discussed above). Accordingly, in step 102, the twenty bandpass filters are rounded exponential (roex) filters spaced at one-Bark intervals on the pitch scale. The rounded exponential function is described in "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns", B.C.J. Moore and B.R. Glasberg (J. Acoust. Soc. Am. 74, 750-753, 1983).
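A Bark-spaced roex filterbank of this kind can be sketched as follows. Several choices here are assumptions rather than details from the source: the Zwicker and Terhardt analytic Bark approximation, the symmetric roex(p) shape W(g) = (1 + pg)exp(-pg) with p = 4fc/ERB, the Moore and Glasberg (1983) ERB polynomial, and the placement of the twenty centres at 1 to 20 Bark.

```python
import math


def hz_to_bark(f_hz):
    # Zwicker & Terhardt analytic approximation (assumed; not stated in the source).
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_to_hz(z, lo=0.0, hi=20000.0, iters=60):
    # Invert hz_to_bark by bisection; the mapping increases monotonically.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def erb_hz(fc_hz):
    # Moore & Glasberg (1983) equivalent rectangular bandwidth, f in kHz, result in Hz.
    f = fc_hz / 1000.0
    return 6.23 * f * f + 93.39 * f + 28.52


def roex_weight(f_hz, fc_hz):
    # Symmetric roex(p) weighting; g is the normalised deviation from the centre.
    p = 4.0 * fc_hz / erb_hz(fc_hz)
    g = abs(f_hz - fc_hz) / fc_hz
    return (1.0 + p * g) * math.exp(-p * g)


# Twenty centre frequencies one Bark apart (1..20 Bark is an assumed range).
CENTRES_HZ = [bark_to_hz(z) for z in range(1, 21)]
```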

Rather than calculating the average signal amplitude in each band every four milliseconds, in this embodiment the signal amplitude is calculated over different averaging periods for the different bands, averaging over two milliseconds for the highest pitch band and 48 milliseconds for the lowest pitch band, with intervening averaging times for the intervening bands. It is found that varying the temporal resolution in dependence upon the pitch (or, in general, the frequency) so as to resolve over a longer interval at lower frequencies gives a substantially improved performance.
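The pitch-dependent averaging can be sketched as below. The source fixes only the end points (48 ms for the lowest band, 2 ms for the highest). The geometric interpolation of the intervening periods, their rounding to a multiple of the 2 ms segment, the use of mean absolute amplitude, and the function names are assumptions made for this illustration.

```python
def averaging_periods_ms(n_bands=20, shortest_ms=2, longest_ms=48):
    """Hypothetical per-band averaging periods; index 0 is the lowest pitch band."""
    if n_bands < 2:
        raise ValueError("need at least two bands")
    periods = []
    for j in range(n_bands):
        frac = j / (n_bands - 1)
        t = longest_ms * (shortest_ms / longest_ms) ** frac  # geometric interpolation
        periods.append(max(shortest_ms, shortest_ms * round(t / shortest_ms)))
    return periods


def band_averages(samples, fs_hz, period_ms):
    """Mean absolute amplitude over consecutive, non-overlapping windows."""
    n = int(round(fs_hz * period_ms / 1000.0))
    return [sum(abs(x) for x in samples[i:i + n]) / n
            for i in range(0, len(samples) - n + 1, n)]
```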

For subsequent processing, as before, an array of bandpass filter output values is generated for each two-millisecond time segment. For bands lower than the highest pitch, values are repeated for the intervening time segments (for example, for the lowest pitch band, each value is repeated 24 times for the two-millisecond time segments between successive 48-millisecond average amplitude values). It would, of course, be possible to perform a numeric interpolation between succeeding values, rather than merely repeating them.
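Mapping each band's averages onto the common 2 ms segment grid, either by repetition or by the linear interpolation mentioned as an alternative, might look like this. The function name and the choice to hold the final value (since nothing follows it to interpolate towards) are assumptions.

```python
def to_segment_grid(values, period_ms, segment_ms=2, interpolate=False):
    """Expand one band's averages onto the common 2 ms segment grid."""
    if period_ms % segment_ms:
        raise ValueError("averaging period must be a multiple of the segment length")
    reps = period_ms // segment_ms
    if not values:
        return []
    if not interpolate:
        # Repeat each average once per segment it covers (24 times for 48 ms).
        return [v for v in values for _ in range(reps)]
    out = []
    for a, b in zip(values, values[1:]):
        # Linear interpolation from this average towards the next one.
        out.extend(a + (b - a) * k / reps for k in range(reps))
    out.extend([values[-1]] * reps)  # hold the last value; nothing follows it
    return out
```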

Steps 103-106 are the same as in the first embodiment (with the adjustment of numerical constants to reflect the different filter responses).

In this embodiment, rather than calculating the loudness of the distortion, a different test measure is derived, which is more closely related to the subjective "listening effort" measure Y_LE.

The sequence of sets of bandpass auditory excitation values (corresponding to a surface along the time and pitch axes) is divided into contiguous sectors of length 96 milliseconds (i.e. 48 successive 2 millisecond segments) so as to include at least two different values for the lowest pitch band. The total amount of error, or error activity, is calculated as:

$$E_A = 10 \log \left( \sum_{i=1}^{48} \sum_{j=1}^{20} |c(i,j)| \right)$$

where c(i,j) is the error value in the ith time segment and jth pitch band of the error surface sector to be analysed.

This gives an indication of the absolute amount of distortion present.

Then, the distribution of the error over time and pitch (or rather, the entropy of the distortion, which corresponds to the reciprocal of the extent to which the energy is distributed) is calculated as follows:

$$E_E = -\sum_{i=1}^{48} \sum_{j=1}^{20} a(i,j) \ln a(i,j), \qquad a(i,j) = \frac{|c(i,j)|}{E_A}$$

The log term in the above expression controls the extent to which the distribution of energy affects the entropy E_E, acting as a non-linear compression function.
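The two sector measures can be computed as sketched below for one 48 x 20 sector c[i][j]. Assumptions: the logarithm in E_A is taken as base 10 (the usual decibel convention), a(i,j) is normalised by E_A, cells with zero error contribute nothing to E_E (since a ln a tends to 0 as a tends to 0), and the sector must be large enough that E_A is positive.

```python
import math


def error_activity(c):
    """E_A = 10 log10(sum of |c(i,j)|) over one sector (base 10 assumed)."""
    total = sum(abs(v) for row in c for v in row)
    return 10.0 * math.log10(total)


def error_entropy(c):
    """E_E = -sum a ln a, with a(i,j) = |c(i,j)| / E_A."""
    ea = error_activity(c)
    if ea <= 0.0:
        raise ValueError("sector error too small for this normalisation")
    ee = 0.0
    for row in c:
        for v in row:
            a = abs(v) / ea
            if a > 0.0:  # a ln a -> 0 as a -> 0, so empty cells add nothing
                ee -= a * math.log(a)
    return ee
```

With these formulae, the same total error spread evenly over a sector gives the same E_A but a larger E_E than when it is concentrated in a single cell, so the pair of values separates amount from distribution.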

It is found that the error activity and error entropy criteria together correspond well to the subjectively perceived level of distortion, as the listener will find a high level of error considerably more noticeable if it is concentrated at a single pitch over a short period of time, rather than being distributed over pitch and time. Accordingly, in this embodiment, as shown in Figure 12, rather than calculating loudness in step 109 of Figure 7b, a step 119 of calculating the amount and distribution (the activity and entropy) of the distortion is performed.

In step 110, the two measures may separately be subjected to thresholds, or they may be combined and the combined measure thresholded. For example, they may be summed or multiplied together with appropriate weightings in a step 120, as shown in Figure 12.
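A minimal sketch of the combination in step 120 follows. The weightings (including their signs) and the threshold are hypothetical placeholders; the source states only that appropriate weightings are used, which would in practice have to be fitted against listener scores.

```python
def combined_measure(activity, entropy, w_activity=1.0, w_entropy=1.0, mode="sum"):
    """Combine error activity and error entropy with hypothetical weightings."""
    if mode == "sum":
        return w_activity * activity + w_entropy * entropy
    if mode == "product":
        return (w_activity * activity) * (w_entropy * entropy)
    raise ValueError("mode must be 'sum' or 'product'")


def exceeds_threshold(measure, threshold):
    """True if the combined measure indicates unacceptable distortion."""
    return measure > threshold
```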



Fourth Embodiment
In this embodiment the speech signal may be generated according to either the first embodiment or the second embodiment. However, the analysis unit 8, rather than performing the above-described masking calculations, directly simulates the human ear, as described for example in "Digital Filter Simulation of the Basilar Membrane", Computer Speech and Language, No. 3, 1989, Ambikairajah, Black, and Linggard (incorporated herein in its entirety by reference). Such a model will receive as input the signal from the ADC 81, and generate a series of outputs at each time segment which correspond to the effects on parts of the human hearing structure of the distorted signal from the telecommunications apparatus 1. The outputs of the model are then combined by appropriate processing and decision logic (for example, a neural network or a fuzzy logic controller), based on empirically derived correlation with actual listener responses, to provide a signal indicating the perceptual significance of the distortion in the signal.

Aspects of the analysis method of this embodiment could also be used with other test signals (for example, real human speech).



Effects of the Invention 
Referring to Figures 11 and 13 to 16, the representation by the first and second embodiments of the invention of various types of telecommunications apparatus distortion of the test signal of Figure 8a will now be illustrated.

Figure 11a shows the error excitation surface produced by instantaneous amplitude distortion, produced by adding low amplitude second and third order terms to the signal. The distortion was characterised as "barely audible" by a human listener. It will be seen that the error loudness figures are small and mostly positive, as shown in Figure 11b.
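The instantaneous amplitude distortion used for Figures 11 and 13 can be simulated with a memoryless polynomial, sketched below. The exact form y = x + a2·x² + a3·x³ and the function name are assumptions; the source gives neither the coefficient values nor the precise expression, only that low (Figure 11) or higher (Figure 13) second and third order terms were added.

```python
def polynomial_distortion(x, a2, a3):
    """Memoryless nonlinearity y = x + a2*x^2 + a3*x^3, applied sample by sample."""
    return [v + a2 * v * v + a3 * v ** 3 for v in x]
```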

Figure 13a shows the corresponding error amplitude surface for fully audible nonlinear distortion of the same type, but with higher value second and third order terms. The amplitude of the error and the error loudness (Figure 13b) are both much larger. Additionally, it will be seen that the majority of the distortion loudness coincides with the voiced part of the test signal of Figure 8a, since this contains low frequency formant tones whose harmonics are perceptually significant.

Referring to Figures 14a and 14b, the effects of modulated noise reference unit (MNRU) distortion are shown. MNRU distortion is described in Annex A of CCITT Recommendation P.81, and is designed to be theoretically equivalent to the distortion introduced by a single A-law PCM stage (of the kind widely used in telecommunications systems). The level of distortion was characterised as fully audible by a human listener. Again, it will be seen from Figure 14a that the perceptual distortion is associated chiefly with formants in the voiced part of the test signal.
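A simplified MNRU can be sketched as multiplicative Gaussian noise, y = x(1 + 10^(-Q/20)·N), where Q is the signal-to-modulated-noise ratio in dB. This sketch omits the band-limiting filters of the full reference unit, and the function name and seeded noise source are illustrative choices.

```python
import random


def mnru(x, q_db, seed=0):
    """Simplified MNRU: y[i] = x[i] * (1 + 10**(-Q/20) * N[i]), N unit Gaussian.

    The band-limiting filters of the full reference unit are omitted.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same noise
    k = 10.0 ** (-q_db / 20.0)
    return [v * (1.0 + k * rng.gauss(0.0, 1.0)) for v in x]
```

Because the noise multiplies the signal, silent portions stay silent, which is consistent with the error being concentrated where the signal energy (here, the voiced formants) lies.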

Referring to Figures 15a and 15b, when crossover distortion is applied (i.e. distortion of the kind y = mx + c for x greater than zero and y = mx - c for x less than zero), low amplitude signals are not transmitted, and so the lower energy unvoiced sound in the second part of the test signal is drastically attenuated. Figures 15a and 15b therefore suggest a very significant subjective impact of this kind of distortion, which corresponds with the reaction of the human listener.

Finally, Figures 16a and 16b illustrate the effects of a voice activity detector with a 50 millisecond onset time. In the initial part of the signal, there is a large negative error loudness because the signal has been clipped. The following positive error loudness is due to overshoot or settling. The error loudness values indicate a high level of perceived distortion, which coincides with the reaction of the human listener.

Other Alternatives and Modifications

It will be clear from the foregoing that many variations to the above described embodiments can be made without altering the principle of operation of the invention. For example, if the telecommunications apparatus is arranged to receive a digital input, the DAC 71 may be dispensed with. The signal from the output port 5 could be supplied in digital form to the input port 2 of the telecommunications apparatus, and the ADC 81 may likewise be dispensed with. Alternatively, an electro-mechanical transducer could be provided at the output port 5 and the signal supplied as an audio signal. In the latter case the test signal may be supplied via an artificial mouth as discussed in CCITT Recommendation P.51 on Artificial Ear and Artificial Mouth, Volume 5, Rec. P.51, Melbourne 1988, and earlier UK patent application GB2218300 (8730347), both incorporated herewith by reference. Similarly, the distorted speech signal could be received via an artificial ear acoustic structure as described in the above CCITT Recommendation and our earlier UK patent application GB2218299 (8730346), incorporated herein by reference. This would reduce the filtering needed in step 101.

Although in the above described embodiments a single decay profile for temporal masking is described, it may be preferred in alternative embodiments of the invention to provide a plurality (for instance 2) of decay rates for forward (and backward) masking, and to select the required decay rate in dependence upon the duration of the masking sound (i.e. the number of time segments over which the amplitude in one of the passbands exceeds a predetermined level). For example, maximum and minimum decays (corresponding to 200 milliseconds and 5 milliseconds duration respectively) may be defined by:

$$y = 58.4039 \times 10^{-0.0059x}$$

$$y = 55.5955 \times 10^{-0.0163x}$$
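Selecting between the two decay profiles by masker duration might be sketched as follows. The source gives the two curves and the durations they correspond to, but not the selection rule or the units of x and y; choosing the profile whose reference duration is nearer on a logarithmic scale (a boundary of about 31.6 ms) is an assumption, as is treating x as the delay after the masker.

```python
import math

LONG_MASKER_MS = 200.0  # duration associated with the slower decay
SHORT_MASKER_MS = 5.0   # duration associated with the faster decay


def masking_decay(x, masker_ms):
    """Masking level y at delay x, using one of the two decay profiles.

    Hypothetical selection rule: pick the profile whose reference masker
    duration is nearer on a log scale (boundary sqrt(200 * 5), about 31.6 ms).
    """
    boundary = math.sqrt(LONG_MASKER_MS * SHORT_MASKER_MS)
    if masker_ms >= boundary:
        return 58.4039 * 10 ** (-0.0059 * x)
    return 55.5955 * 10 ** (-0.0163 * x)
```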

WO 94/00922 PCT/GB93/01322

Although connections to an actual telecommunications apparatus have been described herein, it would equally be possible to programme a computing apparatus to simulate the distortions introduced by telecommunications apparatus, since many such distortions are relatively easy to characterise (for example, those due to VADs or codecs). Accordingly, the invention extends likewise to embodiments in which a signal is supplied to such simulation apparatus, and the simulated distorted output of the telecommunications apparatus is processed. In this way, the acceptability to a human listener of the combination of many complicated and nonlinear communications apparatus may be modelled prior to assembling or connecting such apparatus in the field.
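Modelling a combination of apparatus in software amounts to composing simulated stages, as in the sketch below. The stage functions `gain` and `hard_clip` are hypothetical stand-ins for real simulated elements such as codecs or VADs, which the source does not specify.

```python
def chain(*stages):
    """Compose simulated apparatus stages, applied left to right."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run


def gain(g):
    """Hypothetical stage: fixed linear gain."""
    return lambda x: [g * v for v in x]


def hard_clip(limit):
    """Hypothetical stage: symmetric amplitude limiting."""
    return lambda x: [max(-limit, min(limit, v)) for v in x]
```

The output of such a chain can then be fed to the analysis unit exactly as if it had come from physical apparatus.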

Although the analysis unit 8 and test signal generator 7 have been described as separate hardware, in practice they could be realised by a single suitably programmed digital processor; likewise, the telecommunications apparatus simulator referred to in the above embodiment could be provided by the same processor.

Although in the above described embodiments the analyzer unit 8 receives and analyses the test signal from the test signal generator 7, in practice the analyzer unit 8 could store the excitation data previously derived, by an earlier analysis, for the test sequence or for each of several test sequences. Thus, the analyzer unit in such embodiments need not be arranged itself
In the above described embodiments, other measures of the signal distortion than the error loudness, error activity or error entropy may readily be derived from the calculated data corresponding to Figures 11a, 13a, 14a, 15a and 16a. In fact, loudness of the distortion is only one of the measures of its effect on a human listener; others are listener fatigue and listening effort. For example, the distortion or error data calculated according to the above described embodiments may be employed as inputs to a statistical classifier, neural network, or fuzzy logic engine, operating in accordance with parameters derived empirically by comparative tests with genuine human listeners.

In this document, the term "phoneme" is used, for convenience, to indicate a single, repeatable human speech sound, notwithstanding that in its normal usage a "phoneme" may denote a sound which is modified by its speech context.

Unless the reverse is indicated or apparent, the features of the above embodiments may be combined in manners other than those explicitly detailed herein.

Although the embodiments described above relate to testing telecommunications apparatus, the application of novel aspects of the invention to other testing or analysis is not excluded.

Accordingly, protection is sought for any new matter or combination of new matter disclosed herein, together with variations thereof which would be apparent to the skilled reader, whether or not such matter or variations are within the scope of the following claims.
CLAIMS:



1. Telecommunications testing apparatus comprising a signal generator (7) for supplying a test signal which has a spectral resemblance to human speech but does not correspond to a single speaker conveying intelligent content, and analysis means (8) for receiving a distorted signal which corresponds to said test signal when distorted by telecommunications apparatus (1) to be tested, and for analyzing said distorted signal to generate a distortion perception measure which indicates the extent to which the distortion of said signal will be perceptible to a human listener.

2. Apparatus according to claim 1, in which the analysis means (8) is arranged to estimate the effect which would be produced on the human auditory system by said test signal, and to estimate therefrom the effect which would be produced on the human auditory system by said distortion.

3. Apparatus according to claim 2, in which the analysis means (8) is arranged to estimate the effect which would be produced on the human auditory system by said distorted signal, to determine the difference between the said effect and that due to the test signal, and to generate said distortion perception measure in dependence upon said difference.

4. Apparatus according to any preceding claim, in which the analysis means (8) is arranged so as to generate said distortion perception measure to depend upon the perceptual loudness of said distortion, and to depend nonlinearly upon the amplitude of said distortion.

5. Apparatus according to any preceding claim, in which the analysis means (8) is arranged to generate a plurality of spectral component signals of said test signal and/or said distorted signal.

6. Apparatus according to claim 5, in which the 
component signals have different bandwidths. 

7. Apparatus according to claim 6, in which the component signal bandwidths are selected so that they correspond to equal masking amplitudes for signals centred within each band.

8. Apparatus according to claim 6 or claim 7, in which the component signal bandwidths are, on a logarithmic frequency scale, approximately equal.

9. Apparatus according to claim 6 or claim 7, in which the component signal bandwidths are, on a Bark scale, approximately equal.

10. Apparatus according to any of claims 5 to 9, in which the analysis means (8) is arranged to estimate, for each spectral component signal, the masking effect which that spectral component signal would produce on the human ear.

11. Apparatus according to any preceding claim, in which said analysis means (8) is arranged to estimate the effect which said distortion would produce on the human ear taking into account the temporal persistence of said effect.

12. Apparatus according to any preceding claim, in which the analysis means (8) is arranged to generate a time sequence of successive processed signal segments from said test signal and/or said distorted signal, the value of at least some signal segments being generated in dependence upon portions of said test signal and/or distorted signal which precede and/or succeed said signal segments.

13. Apparatus according to claim 1, in which the analysis means (8) is arranged to decompose the distorted signal into a plurality of spectral component bands, the spectral component bands having bandwidths approximately equally spaced in pitch and being shaped to provide spectral masking; to calculate the temporal masking of the signal due to preceding and/or succeeding temporal portions thereof; to form, for each of the spectral component signals, a representation of the difference between the component signal of the distorted signal and a correspondingly calculated component of the test signal; and to generate said distortion perception measure from said difference measure.

14. Apparatus according to claim 13, in which the analysis means (8) is arranged to generate a measure of the amount of distortion from said difference signal.

15. Apparatus according to claim 13 or claim 14, in which the analysis means (8) is arranged to generate a measure of the spectral and temporal distribution of the distortion from said difference signal.

16. Apparatus according to claim 13, in which the analysis means (8) is arranged to form a weighted sum of the spectral component differences, weighted in accordance with the relative loudness of sounds of reference amplitude at pitches corresponding to said component signals, and to generate said distortion perception measure in dependence upon said weighted sum.

17. Apparatus according to any preceding claim, in which said analysis means (8) is arranged to perform a spectral decomposition of said test and/or distorted signals into a plurality of spectral component signals, the decomposition generating, for each spectral component signal, a time sequence of spectral component values each representative of component signal values over a time interval, the time interval for lower frequency component signals exceeding that for higher frequency component signals.

18. Apparatus according to any preceding claim, in which the analysis means (8) is arranged to filter said test signal and/or said distorted signal in accordance with a filter calculated to correspond to the transfer function of portions of the human auditory system between the telecommunications apparatus and the inner ear.

19. Apparatus according to claim 18, in which the analysis means is arranged to be capable of selecting one of a plurality of different said transfer functions corresponding, respectively, to different telecommunications apparatus.

20. Apparatus according to any preceding claim, further comprising an artificial ear structure for receiving said distorted signal as an acoustic signal and for acoustically processing said distorted signal prior to analysis by the analysis means (8).

21. Apparatus according to any preceding claim, in which the signal generator (7) further comprises an artificial mouth structure for receiving said test signal from the signal generator in acoustic form and for acoustically processing said test signal prior to supply to said telecommunications apparatus.
22. Apparatus according to any preceding claim, in which the signal generator (7) comprises a digital store for storing speech data, and means (72) for reconstituting a speech signal from the stored speech data.

23. Apparatus according to claim 22, in which 
the stored speech data comprises digitised sound 
samples and the reconstituting means (72) comprises a 
digital to analog convertor. 

24. Apparatus according to claim 22, in which the store (71) is arranged to store parameters for control of a voice synthesizer comprising said means for reconstituting the speech signal.

25. Apparatus according to any preceding claim, in which the signal generator (7) is arranged to generate a test signal which comprises a sequence formed of a predetermined, small number of speech segments (e.g. smaller than the number of commonly occurring human speech phonemes), the speech signal comprising several different portions including said segments such that each segment is represented in several different temporal contexts within said sequence, so as to vary the effect on each segment of time-varying distortions in the telecommunications apparatus.

26. Apparatus according to claim 25, in which the test signal generator (7) is arranged to vary the context for different said speech segments by prefixing said segments with predetermined signal portions of several different levels within a sequence of said test signal.

27. Apparatus according to claim 25 in which 
said segments are present in a plurality of different 
combinations within said sequence. 

28. A method of testing telecommunications apparatus comprising analyzing a speech-like test signal which has a spectral resemblance to human speech but does not correspond to a single speaker conveying intelligent content, as distorted by said telecommunications apparatus; determining the extent to which the distortion of said signal will be perceptible to a human listener; and generating a distortion perception measure indicative of said determined extent.

29. A method according to claim 28, comprising generating said test signal, passing said test signal through said telecommunications test apparatus, and analyzing the distorted signal produced at the output of said telecommunications test apparatus.

30. A method according to claim 28 or 29, further comprising analyzing the extent to which said test signal would be perceptible to a human listener, deriving a measure of the difference between the perception of the test signal and of the distorted signal, and deriving said distortion perception measure in dependence upon said difference.
31. Telecommunications testing apparatus comprising a signal generator (7) for supplying a test signal, and analysis means (8) for receiving a distorted signal which corresponds to said test signal when distorted by telecommunications apparatus to be tested, for decomposing said distorted signal into a plurality of spectral component bands, the spectral component bands having bandwidths approximately evenly spaced in pitch; to calculate the temporal masking of the distorted signal; to form, for each of the spectral component signals, the difference between the distorted signal and a correspondingly calculated test signal component for a plurality of successive temporal portions of the test signal; and to generate a distortion perception measure which indicates the extent to which the distortion of said signal will be perceptible to a human listener by deriving a measure of the said spectral component differences over a plurality of said temporal portions and said spectral components, and a measure of the distribution of said differences over said temporal portions and spectral components.
32. Telecommunications testing apparatus comprising analysis means (8) for receiving a distorted signal which corresponds to said test signal when distorted by telecommunications apparatus (1) to be tested, and for generating a plurality of spectral component signals of said distorted signal, each said spectral component signal comprising a time sequence of successive spectral component values representative of component signal levels over a time interval, the time intervals being longer for lower frequency spectral component signals than for higher frequency spectral component signals.

33. Apparatus according to claim 32, in which 
the component signals have different bandwidths. 

34. Apparatus according to claim 33, in which the component signal bandwidths are, on a Bark scale, approximately equal.